OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Arabic font recognition based on diacritics features

Internal identifier: 000134 (Main/Exploration); previous: 000133; next: 000135

Authors: Mohammed Lutf [People's Republic of China]; Xinge You [People's Republic of China]; Yiu-Ming Cheung [People's Republic of China]; C. L. Philip Chen [People's Republic of China]

Source: Pattern recognition (ISSN 0031-3203), 2014

RBID: Pascal:13-0366130

French descriptors

English descriptors

Abstract

Many methods have been proposed for Arabic font recognition, but none of them takes the specific characteristics of the Arabic writing system into account. Most are either general pattern recognition approaches or adaptations of methods developed for languages other than Arabic. This paper is therefore the first attempt to present an alternative method for Arabic font recognition based on diacritics. It treats the diacritics as the fingerprint of an Arabic font, which can be used on their own to identify and recognize the font type. Diacritics are the marks and strokes that were added to the original Arabic alphabet. Although they are the smallest regions in Arabic script, today's technology makes it easy to obtain a high-resolution image at very low cost, and in such images the diacritics reveal very useful information about the font type. In this study, two algorithms for diacritics segmentation have been developed: a flood-fill-based algorithm and a clustering-based algorithm. Experiments show that our approach achieves an average recognition rate of 98.73% on a typical database containing 10 of the most popular Arabic fonts. Compared with existing methods, our approach has the lowest computation cost and can be integrated with OCR systems very easily. Moreover, it can recognize the font type regardless of the amount of input data, since five diacritics, which in most cases can be found in a single word, are enough for font recognition.
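
The abstract names a flood-fill-based segmentation step but gives no details, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes a binarized image stored as a list of rows (1 = ink, 0 = background), labels 4-connected ink regions with an iterative flood fill, and then keeps the regions that are much smaller than the largest one as diacritic candidates. The function names and the `max_area_ratio` threshold are hypothetical choices made for illustration.

```python
from collections import deque


def label_components(image):
    """Return the 4-connected ink regions of a binary image via flood fill.

    Each region is a list of (row, col) pixels, in row-major discovery order.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Start a new flood fill from this unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def diacritic_candidates(image, max_area_ratio=0.25):
    """Keep regions far smaller than the largest one (hypothetical heuristic)."""
    components = label_components(image)
    if not components:
        return []
    largest = max(len(pixels) for pixels in components)
    return [p for p in components if len(p) <= max_area_ratio * largest]


# Toy image: one large "letter body" with a dot above and a short stroke below.
image = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
]

candidates = diacritic_candidates(image)
print(len(candidates), "diacritic candidates found")
```

A size-based filter is only one possible way to separate diacritics from letter bodies; a real system would probably also use position relative to the baseline, and the clustering-based alternative mentioned in the abstract is not sketched here.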


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Arabic font recognition based on diacritics features</title>
<author>
<name sortKey="Lutf, Mohammed" sort="Lutf, Mohammed" uniqKey="Lutf M" first="Mohammed" last="Lutf">Mohammed Lutf</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronics and Information Engineering, Huazhong University of Science and Technology</s1>
<s2>Wuhan</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Wuhan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xinge You" sort="Xinge You" uniqKey="Xinge You" last="Xinge You">XINGE YOU</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronics and Information Engineering, Huazhong University of Science and Technology</s1>
<s2>Wuhan</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Wuhan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Cheung, Yiu Ming" sort="Cheung, Yiu Ming" uniqKey="Cheung Y" first="Yiu-Ming" last="Cheung">Yiu-Ming Cheung</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Department of Computer Science, Hong Kong Baptist University</s1>
<s2>Hong Kong</s2>
<s3>CHN</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Hong Kong</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chen, C L Philip" sort="Chen, C L Philip" uniqKey="Chen C" first="C. L. Philip" last="Chen">C. L. Philip Chen</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronics and Information Engineering, Huazhong University of Science and Technology</s1>
<s2>Wuhan</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Wuhan</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>Faculty of Science of Technology, University of Macau</s1>
<s3>CHN</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Faculty of Science of Technology, University of Macau</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">13-0366130</idno>
<date when="2014">2014</date>
<idno type="stanalyst">PASCAL 13-0366130 INIST</idno>
<idno type="RBID">Pascal:13-0366130</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000037</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000731</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000017</idno>
<idno type="wicri:doubleKey">0031-3203:2014:Lutf M:arabic:font:recognition</idno>
<idno type="wicri:Area/Main/Merge">000135</idno>
<idno type="wicri:Area/Main/Curation">000134</idno>
<idno type="wicri:Area/Main/Exploration">000134</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Arabic font recognition based on diacritics features</title>
<author>
<name sortKey="Lutf, Mohammed" sort="Lutf, Mohammed" uniqKey="Lutf M" first="Mohammed" last="Lutf">Mohammed Lutf</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronics and Information Engineering, Huazhong University of Science and Technology</s1>
<s2>Wuhan</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Wuhan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xinge You" sort="Xinge You" uniqKey="Xinge You" last="Xinge You">XINGE YOU</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronics and Information Engineering, Huazhong University of Science and Technology</s1>
<s2>Wuhan</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Wuhan</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Cheung, Yiu Ming" sort="Cheung, Yiu Ming" uniqKey="Cheung Y" first="Yiu-Ming" last="Cheung">Yiu-Ming Cheung</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>Department of Computer Science, Hong Kong Baptist University</s1>
<s2>Hong Kong</s2>
<s3>CHN</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Hong Kong</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chen, C L Philip" sort="Chen, C L Philip" uniqKey="Chen C" first="C. L. Philip" last="Chen">C. L. Philip Chen</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronics and Information Engineering, Huazhong University of Science and Technology</s1>
<s2>Wuhan</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Wuhan</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>Faculty of Science of Technology, University of Macau</s1>
<s3>CHN</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Faculty of Science of Technology, University of Macau</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Pattern recognition</title>
<title level="j" type="abbreviated">Pattern recogn.</title>
<idno type="ISSN">0031-3203</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Pattern recognition</title>
<title level="j" type="abbreviated">Pattern recogn.</title>
<idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithm</term>
<term>Alternative method</term>
<term>Arabic</term>
<term>Arabic alphabet</term>
<term>Automatic classification</term>
<term>Composite material</term>
<term>Cost lowering</term>
<term>Database</term>
<term>High resolution</term>
<term>Integrated system</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Segmentation</term>
<term>Signal classification</term>
<term>Useful information</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Arabe</term>
<term>Reconnaissance forme</term>
<term>Méthode alternative</term>
<term>Haute résolution</term>
<term>Diminution coût</term>
<term>Information utile</term>
<term>Algorithme</term>
<term>Segmentation</term>
<term>Classification automatique</term>
<term>Base de données</term>
<term>Système intégré</term>
<term>Reconnaissance optique caractère</term>
<term>Matériau composite</term>
<term>Classification signal</term>
<term>Alphabet arabe</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Base de données</term>
<term>Matériau composite</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Many methods have been proposed for Arabic font recognition, but none of them has considered the specialty of the Arabic writing system. Most of these methods are either general pattern recognition approaches or application of other methods which have been developed for languages other than Arabic. Therefore, this paper is the first attempt to present an alternative method for Arabic font recognition based on diacritics. It presents the diacritics as the thumb of Arabic fonts which can be used individually to identify and recognize the font type. Diacritics are the marks and strokes which have been added to the original Arabic alphabet. Though they are the smallest regions in the Arabic script, with today technology it is very easy to get a high resolution image with a very low cost. In this kind of images, the diacritics can reveal very useful information about the font type. In this study, two algorithms for diacritics segmentation have been developed, namely flood-fill based and clustering based algorithm. The experiments conducted proved that our approach can achieve an average recognition rate of 98.73% on a typical database that contains 10 of the most popular Arabic fonts. Compared with existing methods, our approach has the minimum computation cost and it can be integrated with OCR systems very easily. Moreover, it could recognize the font type regardless of the amount of the input data since five diacritics, which in most cases can be found in only one word, are enough for font recognition.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
</country>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Lutf, Mohammed" sort="Lutf, Mohammed" uniqKey="Lutf M" first="Mohammed" last="Lutf">Mohammed Lutf</name>
</noRegion>
<name sortKey="Chen, C L Philip" sort="Chen, C L Philip" uniqKey="Chen C" first="C. L. Philip" last="Chen">C. L. Philip Chen</name>
<name sortKey="Chen, C L Philip" sort="Chen, C L Philip" uniqKey="Chen C" first="C. L. Philip" last="Chen">C. L. Philip Chen</name>
<name sortKey="Cheung, Yiu Ming" sort="Cheung, Yiu Ming" uniqKey="Cheung Y" first="Yiu-Ming" last="Cheung">Yiu-Ming Cheung</name>
<name sortKey="Xinge You" sort="Xinge You" uniqKey="Xinge You" last="Xinge You">XINGE YOU</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000134 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000134 | SxmlIndent | more
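
For readers without Dilib installed, the XML record shown earlier on this page can also be inspected with a general-purpose parser. The sketch below uses Python's standard `xml.etree.ElementTree`. Because the record uses the `wicri:` and `inist:` prefixes without declaring them, a strict parser would reject it as written; the sketch therefore injects placeholder namespace declarations into the root element. Those placeholder URIs, the helper names, and the shortened `SAMPLE` record are assumptions made for illustration, not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs (assumptions): they exist only so the parser accepts the
# undeclared "wicri:" and "inist:" prefixes; they are not official namespaces.
PLACEHOLDER_NS = 'xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist"'


def parse_record(xml_text):
    """Parse a record after declaring the placeholder namespaces on <record>."""
    patched = xml_text.replace("<record>", "<record " + PLACEHOLDER_NS + ">", 1)
    return ET.fromstring(patched)


def title_and_authors(root):
    """Return the article title and the author names from the titleStmt."""
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    title = title_stmt.findtext("title")
    authors = [name.text for name in title_stmt.findall("author/name")]
    return title, authors


# Shortened excerpt of the record, kept small for the example.
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Arabic font recognition based on diacritics features</title>
<author>
<name sortKey="Lutf, Mohammed" first="Mohammed" last="Lutf">Mohammed Lutf</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s2>Wuhan</s2>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Cheung, Yiu Ming" first="Yiu-Ming" last="Cheung">Yiu-Ming Cheung</name>
</author>
</titleStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

title, authors = title_and_authors(parse_record(SAMPLE))
print(title)
print("; ".join(authors))
```

Patching the root tag as a string is a pragmatic workaround for this one record layout; if the export declared its namespaces, the record could be passed to `ET.fromstring` directly.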

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0366130
   |texte=   Arabic font recognition based on diacritics features
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024